Literature references

A collection of intrinsic disorder characterizations from eukaryotic proteomes (the database used in this analysis). https://www.ncbi.nlm.nih.gov/pubmed/27326998

Molecular signaling involving intrinsically disordered proteins in prostate cancer. https://www.ncbi.nlm.nih.gov/pubmed/27212129

Sequence- and Structure-Based Analysis of Tissue-Specific Phosphorylation Sites. https://www.ncbi.nlm.nih.gov/pubmed/27332813

Intrinsically Disordered Side of the Zika Virus Proteome. https://www.ncbi.nlm.nih.gov/pubmed/27867910

Origin of a folded repeat protein from an intrinsically disordered ancestor. https://www.ncbi.nlm.nih.gov/pubmed/27623012

Setting up parameters of the analysis

=========================================================

Species to select from: “Homo sapiens”, “Mus musculus”, “strain ATCC 204508” (Saccharomyces cerevisiae), “strain K12” (Escherichia coli). Paste the value into the code chunk below.

Which UniProt entries to use: all of UniProt if reviewed == 1, only Swiss-Prot data if reviewed == 2, only TrEMBL data if reviewed == 3. Only reviewed = 2 is relevant for this analysis. Paste the value into the code chunk below.

Whether to distinguish between isoforms (TRUE) or to use only generic UniProt accessions (FALSE). Only isoforms = FALSE is relevant for this analysis. Paste the value into the code chunk below.

Specify the date for which you want to perform the analysis (if not today). The “logic table” produced by the protein_properties script is needed for this analysis. Filename example:
“proteome_vs_interactome_protein_properties_f_Homo sapiens_reviewed_2_isoforms_FALSE_2016-12-01.txt”

Date of analysis (the date of the UniProt protein list used): 2016-12-01
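
The parameters above can be collected in one chunk and used to rebuild the name of the logic table file. This is a minimal sketch: the variable names are illustrative, and the filename pattern is an assumption inferred from the single example shown above.

```r
# Analysis parameters (values as stated in the text above)
species       <- "Homo sapiens"
reviewed      <- 2                      # 2 = Swiss-Prot entries only
isoforms      <- FALSE                  # FALSE = generic accessions only
analysis_date <- as.Date("2016-12-01")  # replace with Sys.Date() for today

# Assumed filename pattern, inferred from the example filename
logic_table_file <- paste0(
  "proteome_vs_interactome_protein_properties_f_", species,
  "_reviewed_", reviewed,
  "_isoforms_", isoforms,
  "_", format(analysis_date, "%Y-%m-%d"), ".txt"
)
print(logic_table_file)
```

Keeping the date as a Date object, rather than a string, makes it easy to switch between a fixed snapshot and the current day.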

Species ID lookup

=========================================================

Looking up the species ID in the UniProt README file (always read from the URL)

## [1] "UP000005640 9606    HUMAN     21031   71897   93381  Homo sapiens (Human)"
##   Proteome_ID SPECIES_ID
## 1 UP000005640       9606
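
The lookup shown above could be done roughly as follows. This sketch does not download anything: it parses the README line printed above, and the helper name parse_proteome_line is hypothetical. It assumes the first two whitespace-separated fields are the proteome ID and the species (taxonomy) ID, as the output suggests.

```r
# One line of the README table, copied from the output above
readme_lines <- c(
  "UP000005640 9606    HUMAN     21031   71897   93381  Homo sapiens (Human)"
)
species <- "Homo sapiens"

# Hypothetical helper: split a README line on runs of whitespace and
# keep the first two fields
parse_proteome_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  data.frame(Proteome_ID = fields[1],
             SPECIES_ID  = fields[2],
             stringsAsFactors = FALSE)
}

# Select the line that mentions the chosen species, then parse it
hit <- grep(species, readme_lines, fixed = TRUE, value = TRUE)
ids <- parse_proteome_line(hit[1])
print(ids)
```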

Reading SQL database for information about the disordered proteins

=========================================================

Read the previously saved table (the script can be started from here).

Merge the percentage of each protein that is disordered with the interaction data.

## [1] "disorder NAs: 83"
## [1] "CIDER data NAs: 83"
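
The NA counts above are what a left join produces when some proteins have no disorder annotation. A minimal sketch with made-up data: the accessions and the in_interactome column are hypothetical, and only the IUPRED_PD column name is taken from this report.

```r
# Hypothetical interaction table: one row per protein
interactions <- data.frame(
  UniprotAC      = c("P00001", "P00002", "P00003"),
  in_interactome = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical disorder table: one protein is missing
disorder <- data.frame(
  UniprotAC = c("P00001", "P00003"),
  IUPRED_PD = c(0.10, 0.45),
  stringsAsFactors = FALSE
)

# Left join: keep every protein from the interaction table;
# proteins without disorder data get NA
merged <- merge(interactions, disorder, by = "UniprotAC", all.x = TRUE)

n_missing <- sum(is.na(merged$IUPRED_PD))
print(paste0("disorder NAs: ", n_missing))
```

Using all.x = TRUE keeps unannotated proteins visible, so the number of missing values can be reported instead of silently dropping rows.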

Results

Protein disorder

How does the fraction of disordered regions of a protein influence its detection in protein interactions?

=========================================================

Protein disorder as described by consensus across disordered region prediction algorithms (IUPRED, DisEMBL)

## Warning: Ignoring unknown parameters: draw_quantiles, scale

Protein disorder as described by the capacity to form interresidue contacts necessary for structural stability (IUPRED)

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

Protein disorder as described by missing X-ray crystallography coordinates (neural network prediction, DisEMBL)

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

Protein disorder as described by coils/loops content - excluding \(\alpha\)-helix, \(3_{10}\)-helix, \(\beta\)-sheets (neural network prediction, DisEMBL)

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

Protein disorder as described by hot-loops content (high \(\alpha\)-carbon temperature factor) - coils/loops excluding \(\alpha\)-helix, \(3_{10}\)-helix, \(\beta\)-sheets (neural network prediction, DisEMBL-H)

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

How do the physical and chemical properties of a protein influence its detection in protein interactions?

net charge per residue vs detection in protein interactions

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale
## Warning: Removed 24 rows containing non-finite values (stat_boxplot).

amino acid charge segregation vs detection in protein interactions

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

fraction of charged residues vs detection in protein interactions

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

mean hydropathy vs detection in protein interactions

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale
## Warning: Removed 114 rows containing non-finite values (stat_boxplot).

proline content vs detection in protein interactions

=========================================================

## Warning: Ignoring unknown parameters: draw_quantiles, scale

What is the relationship between hydropathy and protein disorder (consensus)?

=========================================================

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 107 rows containing non-finite values (stat_smooth).
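
The two warnings above are the kind ggplot2 emits when an axis transformation produces infinite values, for example a log-scaled axis meeting zeros, and the affected rows are then dropped before smoothing. The sketch below reproduces the arithmetic in base R. The values are made up, and the use of a log10 axis in the original plots is an assumption.

```r
# Made-up disorder fractions; zeros are common for fully ordered proteins
consensus_pd <- c(0, 0.12, 0.5, 0, 1)

# log10(0) is -Inf, which is what the transformation warning refers to
log_pd <- log10(consensus_pd)

# Rows that a plotting layer would have to drop
n_nonfinite <- sum(!is.finite(log_pd))
print(n_nonfinite)

# Option: filter explicitly, so the number of dropped rows is a
# deliberate choice rather than a side effect of plotting
finite_only <- consensus_pd[is.finite(log_pd)]
print(finite_only)
```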

What is the relationship between hydropathy and net charge per residue?

=========================================================

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 107 rows containing non-finite values (stat_smooth).

What is the relationship between hydropathy and the fraction of charged residues (CIDER)?

=========================================================

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 107 rows containing non-finite values (stat_smooth).

What is the relationship between hydropathy and amino acid charge segregation?

=========================================================

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 107 rows containing non-finite values (stat_smooth).

rlm (robust linear model, from the MASS package) is less sensitive to outliers than ordinary least squares
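
To illustrate why a robust fit is preferred here, the sketch below compares lm() and MASS::rlm() on synthetic data with a single large outlier. The data are invented for illustration and are unrelated to the protein tables. rlm() is called with its defaults, which may differ from the settings used for the plots above.

```r
library(MASS)  # provides rlm()

# Synthetic data: true slope 2, small deterministic wiggle, one outlier
x <- 1:20
y <- 2 * x + 1 + 0.5 * sin(x)
y[20] <- y[20] + 50   # a single gross outlier at the largest x

fit_lm  <- lm(y ~ x)    # ordinary least squares
fit_rlm <- rlm(y ~ x)   # robust M-estimation (default settings)

# Compare estimated slopes; the robust slope should stay closer to 2
slopes <- c(lm  = unname(coef(fit_lm)["x"]),
            rlm = unname(coef(fit_rlm)["x"]))
print(slopes)
```

The robust fit down-weights observations with large residuals, so one extreme point pulls the line much less than it does in the least-squares fit.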

PCA

##            log10_Mass IUPRED_PD COILS_PD HOTLOOPS_PD REM465_PD
## A0A075B6P5   4.112504    0.0167   0.7667      0.2250    0.2500
## A0A075B6Q5   4.110287    0.0000   0.5085      0.3136    0.0678
## A0A075B6S6   4.121067    0.0000   0.8583      0.3500    0.1833
## A0A075B759   4.260000    0.0061   0.7012      0.4268    0.0488
## A0A087WTH1   4.073058    0.0000   0.2407      0.2593    0.2130
## A0A087WTH5   4.176901    0.1515   0.5227      0.3636    0.2121
##            ICHR_consensus_PD Frac_DP Hydropathy_m Hydropathy_U     FCR
## A0A075B6P5              0.00 0.60833      4.43750      0.49306 0.12500
## A0A075B6Q5              0.00 0.58475      4.55000      0.50556 0.16102
## A0A075B6S6              0.00 0.60833      4.29333      0.47704 0.14167
## A0A075B759              0.00 0.62805      4.09085      0.45454 0.25610
## A0A087WTH1              0.00 0.57407      5.01296      0.55700 0.21296
## A0A087WTH5              0.12 0.58333      4.18939      0.46549 0.19697
##                NCPR   Kappa   Delta Delta_max norm_Proline_content
## A0A075B6P5 -0.00833 0.22638 0.02242   0.09903            0.8272056
## A0A075B6Q5  0.00847 0.17790 0.02420   0.13601           -1.6374430
## A0A075B6S6  0.00833 0.25342 0.02948   0.11634            0.8272056
## A0A075B759  0.04878 0.15589 0.03683   0.23629           -1.0605201
## A0A087WTH1  0.00926 0.18928 0.03554   0.18779           -0.7847848
## A0A087WTH5  0.01515 0.15889 0.02791   0.17564            0.2163461
## List of 3
##  $ d: num [1:15] 902.1 194.8 69 34.5 19.6 ...
##  $ u: num [1:15, 1:15] -0.7341 -0.0352 -0.0976 -0.0505 -0.0277 ...
##  $ v: num [1:20129, 1:15] -0.00683 -0.00675 -0.00674 -0.00661 -0.00708 ...
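
The list printed above has the shape returned by svd() on a variables-by-proteins matrix (15 x 20129). A sketch of PCA computed that way, on random data. Whether the original analysis centred or scaled the columns is not stated, so the centring step here is an assumption, and the variable names are illustrative.

```r
set.seed(1)
n <- 50   # observations (proteins in the report)
p <- 4    # variables (15 in the report)
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("var", 1:p)))

# Centre each variable (assumed preprocessing)
Xc <- scale(X, center = TRUE, scale = FALSE)

# SVD of the transposed matrix: u is p x p, v is n x p
s <- svd(t(Xc))
str(s)

# Fraction of variance captured by each component
var_explained <- s$d^2 / sum(s$d^2)

# Loadings (variables) and scores (observations)
loadings <- s$u
scores   <- s$v %*% diag(s$d)
print(round(var_explained, 3))
```

With centred data, the singular values divided by sqrt(n - 1) match the standard deviations reported by prcomp(), which is a convenient cross-check.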

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.10.0 MASS_7.3-45       scales_0.4.1      ggplot2_2.2.1    
## [5] dplyr_0.5.0       RSQLite_1.1-2    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8      knitr_1.15.1     magrittr_1.5     munsell_0.4.3   
##  [5] colorspace_1.3-2 R6_2.2.0         stringr_1.1.0    plyr_1.8.4      
##  [9] tools_3.3.2      grid_3.3.2       gtable_0.2.0     DBI_0.5-1       
## [13] htmltools_0.3.5  yaml_2.1.14      lazyeval_0.2.0   assertthat_0.1  
## [17] rprojroot_1.1    digest_0.6.11    tibble_1.2       reshape2_1.4.2  
## [21] memoise_1.0.0    evaluate_0.10    rmarkdown_1.3    labeling_0.3    
## [25] stringi_1.1.2    backports_1.0.4